# High-resolution Processing

- **Auramask Ensemble Moon** (logasja) · GPL-3.0 · Image Generation · 17 downloads, 0 likes
  An improved VNet architecture for 2D image-to-image translation, with adversarial and aesthetic optimization objectives.
- **C-RADIOv2** (nvidia) · Other · Transformers · 648 downloads, 11 likes
  A visual feature extraction model from NVIDIA, available in multiple sizes, suited to image understanding and dense prediction tasks.
- **`vit_so400m_patch14_siglip_gap_384.webli`** (timm) · Apache-2.0 · Image Classification, Transformers · 96 downloads, 0 likes
  SigLIP-based Vision Transformer that produces image features via global average pooling.
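For the `gap` variant above, the image feature comes from global average pooling over the patch tokens rather than an attention-pooling head. A minimal sketch of that pooling step, using dummy token embeddings (the shapes are illustrative, not the model's actual dimensions):

```python
def global_average_pool(tokens):
    """Average a list of patch-token embeddings into one image feature.

    tokens: list of equal-length vectors, one per image patch.
    Returns a single vector of the same dimensionality.
    """
    dim = len(tokens[0])
    n = len(tokens)
    return [sum(tok[i] for tok in tokens) / n for i in range(dim)]

# Dummy "patch tokens": 4 patches with 3-dimensional embeddings.
patch_tokens = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [0.0, 0.0, 0.0],
    [4.0, 4.0, 4.0],
]
image_feature = global_average_pool(patch_tokens)
print(image_feature)  # → [2.0, 2.0, 2.0]
```

In the real model the pooled vector would then feed the downstream head; the attention-pooling variants below instead learn a weighted combination of the tokens.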
- **`vit_base_patch16_siglip_512.webli`** (timm) · Apache-2.0 · Image Classification, Transformers · 702 downloads, 0 likes
  SigLIP-based Vision Transformer containing only the image encoder, with the original attention-pooling head.
- **`vit_base_patch16_siglip_256.webli_i18n`** (timm) · Apache-2.0 · Image Classification, Transformers · 16 downloads, 0 likes
  SigLIP-based ViT-B/16 containing only the image encoder, with the original attention-pooling head.
- **`convnext_large_mlp.clip_laion2b_ft_soup_320`** (timm) · Apache-2.0 · Image Classification, Transformers · 173 downloads, 0 likes
  ConvNeXt-Large CLIP image encoder fine-tuned on LAION-2B, supporting feature extraction at 320×320 resolution.
- **DUSt3R ViTLarge BaseDecoder 512 DPT** (naver) · 3D Vision · 46.93k downloads, 14 likes
  DUSt3R makes geometric 3D vision from images straightforward, reconstructing 3D scenes from one or more input images.
- **ViT-L-14-336** (asakhare) · MIT · Image Classification · 20 downloads, 0 likes
  Large vision-language model built on the Vision Transformer architecture, supporting zero-shot image classification.
- **Artwork Scorer** (Muinez) · Apache-2.0 · Image Classification, Transformers · 32 downloads, 5 likes
  A fine-tune of Facebook's ConvNeXt V2 architecture, trained for multi-label classification of Pixiv ranking images.
- **`eva02_enormous_patch14_clip_224.laion2b_s4b_b115k`** (timm) · MIT · Text-to-Image · 130 downloads, 1 like
  Large vision-language model based on the EVA02 architecture, supporting zero-shot image classification.
- **`eva02_large_patch14_clip_336.merged2b_s6b_b61k`** (timm) · MIT · Text-to-Image · 15.78k downloads, 0 likes
  EVA02 is a large vision-language model using the CLIP architecture, supporting zero-shot image classification.
- **`vit_large_patch16_224`** (google) · Apache-2.0 · Image Classification · 188.47k downloads, 30 likes
  Transformer-based image classification model, pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k.
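Several entries above (CLIP, SigLIP, EVA02) support zero-shot classification by comparing one image embedding against text embeddings of candidate labels and picking the most similar. A minimal sketch of that scoring step, using dummy embeddings (the vectors and label prompts are placeholders, not real encoder outputs):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def zero_shot_classify(image_emb, label_embs):
    """Rank candidate labels by cosine similarity to the image embedding."""
    scores = {label: cosine(image_emb, emb) for label, emb in label_embs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Dummy embeddings standing in for image- and text-encoder outputs.
image_emb = [0.9, 0.1, 0.0]
label_embs = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
}
ranking = zero_shot_classify(image_emb, label_embs)
print(ranking[0][0])  # → a photo of a cat
```

With a real model, `image_emb` and each text embedding would come from the model's image and text encoders, and the similarities are typically scaled by a learned temperature before a softmax; the ranking logic is the same.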